The answer to the above question depends strongly on
whether or not analytical expressions for the components of
the gradient and the elements of the Hessian matrix are
available. It also depends on the relative importance of
the computational effort associated with algorithmic operations
vis-a-vis the computational effort associated with
function evaluations.

Both theoretical considerations and extensive numerical
examples carried out in conjunction with the Fletcher-Reeves
algorithm, the Davidon-Fletcher-Powell algorithm, and the
quasilinearization algorithm suggest the following: the
$N_e$ concept, while accurate in some cases, has drawbacks in
other cases; indeed, it might lead to a distorted view of
the relative importance of an algorithm with respect to
another.

The above distortion can be corrected through the
introduction of a more general parameter $\tilde{N}_e$. This generalized
parameter is constructed so as to reflect accurately the
computational effort associated with function evaluations
and algorithmic operations.

From the analyses performed and the results obtained, it
is inferred that, due to the weaknesses of the $N_e$ concept,
the use of the $\tilde{N}_e$ concept is advisable. In effect, this is
the same as stating that, in spite of its obvious shortcomings,
the direct measurement of the CPU time is still the more
reliable way of comparing different minimization algorithms.


Key Words. Numerical analysis, numerical methods, computing
methods, computing techniques, complexity of computation,
philosophy of computation, comparison of algorithms,
computational speed, measurement of computational speed, number of
function evaluations, equivalent number of function evaluations,
unconstrained minimization, mathematical programming.

1. Introduction

Over  the  past  two  decades,  a large  number  of  mathematical 
programming  problems  have  been  studied  both  analytically  and 
numerically.  Generally  speaking,  these  problems  belong  to 
four  principal  categories:  (i)  unconstrained  minimization 
problems,  (ii)  constrained  minimization  problems  involving 
equality  constraints,  (iii)  constrained  minimization  problems 
involving inequality constraints, and (iv) constrained
minimization problems involving both equality and inequality
constraints.

For each category of problems, three types of methods
have been developed, more specifically: (a) zeroth-order
methods,  (b)  first-order  methods,  and  (c)  second-order 
methods.  Methods  of  type  (a)  utilize  only  the  functions 
under  consideration  and  avoid  the  computation  of  derivatives. 
Methods  of  type  (b)  utilize  the  functions  under  consideration 
and  their  first  derivatives.  And  methods  of  type  (c)  utilize 
the  functions  under  consideration  together  with  their  first 
and  second  derivatives. 

For  each  category  of  problems  and  each  method,  several 
classes  of  algorithms  have  been  developed.  As  an  example, 
with  reference  to  the  category  of  unconstrained  minimization 
problems  and  first-order  methods,  the  following  classes  of 
algorithms are available today: ordinary-gradient algorithm,
conjugate-gradient algorithm, variable-metric algorithm,
memory-gradient algorithm, and supermemory-gradient algorithm.

It  is  clear  that  a bewildering  combination  of  problems, 
methods,  and  algorithms  exists  and  that  the  proliferation  of 
these  algorithms  is  bound  to  cause  some  confusion  in  the 
user,  that  is,  the  engineer,  the  chemist,  or  the  economist 
who  must  solve  problems  of  the  real  world.  Faced  with  a 
given  technical  problem,  the  user  would  like  to  know  an 
answer  to  the  following  question:  what  kind  of  algorithm 
should  be  selected  to  solve  the  problem  under  consideration? 

Unfortunately, no clear-cut answer can be given to the
above question. Nevertheless, the identification of potentially
successful algorithms can be facilitated if the developer
of an algorithm supplies sufficient information
about the following items: (A) algorithm robustness or
convergence range; (B) convergence rate; (C) computational
speed; (D) memory requirements; and (E) programming complexity.
Only for simply structured problems (for instance,
linear-quadratic problems) can the above information be
predicted theoretically. For more general problems, the
help of computer experimentation is nearly indispensable.

In this study, we are concerned with Item (C), the
measurement of computational speed. In particular, we
examine critically the concept of equivalent number of
function evaluations, $N_e$, and inquire whether this quantity
constitutes a fair way of comparing different minimization
algorithms. Then, we introduce a more general parameter
$\tilde{N}_e$, which is constructed so as to reflect accurately the
computational effort associated with function evaluations
and algorithmic operations. Next, we examine the $N_e$ concept
vis-a-vis the $\tilde{N}_e$ concept through several numerical
examples. These examples include some widely used test
functions (Rosenbrock, Wood, Powell, and Miele functions,
generalized Rosenbrock function, and so on) and some widely
used minimization algorithms (Fletcher-Reeves algorithm,
Davidon-Fletcher-Powell algorithm, and quasilinearization
algorithm).

Unconstrained Minimization. For the sake of simplicity,
we consider in the following sections only one category of
problems, namely, unconstrained minimization problems. More
specifically, we consider the function

$$ f = f(x) , \tag{1} $$

where f is a scalar and x is an n-vector whose components
are unconstrained. We denote by g(x) the gradient and by
H(x) the Hessian. We observe that the gradient vector has
n components and that the Hessian matrix has $n^2$ elements.
Of these $n^2$ elements, only

$$ m = n(n+1)/2 \tag{2} $$

need to be calculated, owing to the symmetry of the Hessian
matrix.

2. Measurement of Computational Speed

The methods employed for measuring the computational
speed of different minimization algorithms can be grouped
into two classes: (i) direct measurement and (ii) indirect
measurement.

Direct Measurement. The most direct way of evaluating
the computational speed of an algorithm is to measure the
so-called CPU time (the symbol CPU stands for central
processing unit). The main advantage of this quantity is that
it includes both function evaluation time and algorithmic
time. The main disadvantage is that the CPU time is machine
dependent as well as operator (programmer) dependent. The
above difficulties can be removed to some degree if the
comparison of different algorithms is done on a single computer,
with the same programming language, with the same compiler,
with the same subroutines, under similar workload conditions
of the computer, and by the same programmer. In other words,
it is essential that the same experimental conditions be kept
for all of the algorithms being investigated. For an
example of comparative experiments done under these conditions,
see Refs. 1-2.

Normalized Time. In an attempt to make the direct
measurement of the CPU time independent of the particular computer,
Colville introduced in Ref. 3 the concept of normalized time,
namely, the ratio

$$ T / T_s . \tag{3} $$

Here, T is the CPU time required to solve a particular test
problem with the algorithm under consideration, and $T_s$ is
the CPU time required to execute a so-called standard
program, devised by Colville. This standard program consists
of inverting a 40x40 matrix ten times. Ideally, this
parameter should be machine independent. In practice, it might
still depend on the subroutines being used.

Indirect  Measurement.  There  exist  three  major  ways  for 
evaluating  the  computational  speed  of  an  algorithm  indirectly: 
(a)  number  of  iterations;  (b)  number  of  function,  gradient, 
and  Hessian  evaluations;  and  (c)  equivalent  number  of  function 
evaluations.  All  of  these  quantities  are  machine  independent, 
operator  independent,  and  simple  to  compute.  However,  their 
use  implies  the  drawbacks  discussed  below. 

Number  of  Iterations.  Generally  speaking,  the  time  per 
iteration  varies  from  one  algorithm  to  another.  Therefore, 
one  cannot  employ  the  number  of  iterations  N as  an  indicator 
of  computational  speed,  unless  one  can  be  reasonably  sure 
that all of the algorithms being compared require approximately
the same workload per iteration. For an example where
this  situation  arises,  see  Ref.  4. 

Number of Function, Gradient, and Hessian Evaluations.
If one can be reasonably sure that the algorithmic time is
negligible by comparison with the function evaluation time,
one can use the triplet composed of number of function
evaluations ($N_0$), number of gradient evaluations ($N_1$), and number
of Hessian evaluations ($N_2$) as a collective indicator of
computational speed. The trouble is that the resulting
indication is unclear, unless one is willing to attribute relative
weights to function, gradient, and Hessian evaluations.

Equivalent Number of Function Evaluations. Let the relative
weights (1, n, m) be attributed to the elements of the
triplet ($N_0$, $N_1$, $N_2$). With this understanding, one can form
the following linear combination:

$$ N_e = N_0 + n N_1 + m N_2 , \tag{4} $$

which is called the equivalent number of function evaluations.
This parameter can be used as an indicator of computational
speed provided the algorithmic time is negligible by
comparison with the function evaluation time and provided the
weights (1, n, m) measure correctly the relative importance of
function evaluation, gradient evaluation, and Hessian
evaluation. However, this depends on whether analytical expressions
for the components of the gradient and the elements of the
Hessian matrix are available or not.

3. Standard Definition of Equivalent Number of Function
Evaluations

Assume that some particular algorithm is employed in order
to obtain the minimum of the function (1) on a digital computer.
The total CPU time T can be written as

$$ T = T_a + T_e . \tag{5} $$

Here, $T_a$ is the algorithmic time, namely, the CPU time required
to perform the arithmetic operations intrinsic to the
algorithm being employed. And $T_e$ is the function evaluation time,
namely, the CPU time required to evaluate the function, the
gradient, and the Hessian. Therefore, $T_e$ can be written as

$$ T_e = T_0 + T_1 + T_2 . \tag{6} $$

Here, $T_0$ denotes the CPU time associated with function
evaluations, $T_1$ denotes the CPU time associated with gradient
evaluations, and $T_2$ denotes the CPU time associated with Hessian
evaluations.

Let $\tau_0$, $\tau_1$, $\tau_2$ denote the basic times required to compute
one function, one gradient, and one Hessian, respectively.
Observe that the following relations hold:

$$ T_0 = \tau_0 N_0 , \qquad T_1 = \tau_1 N_1 , \qquad T_2 = \tau_2 N_2 , \tag{7} $$

$$ T = T_a + \tau_0 N_0 + \tau_1 N_1 + \tau_2 N_2 . \tag{8} $$

Next, let the following assumptions be employed:

(A1) From the point of view of the CPU time, one gradient
evaluation is equivalent to n function evaluations.

(A2) From the point of view of the CPU time, one Hessian
evaluation is equivalent to m function evaluations.

(A3) The algorithmic time is negligible by comparison
with the function evaluation time.

In equation form, the above assumptions can be rewritten
as follows:

$$ \tau_1 = n \tau_0 , \tag{9-1} $$

$$ \tau_2 = m \tau_0 , \tag{9-2} $$

$$ T_a = 0 . \tag{9-3} $$
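
As a worked step, the following sketch shows how the total CPU time collapses once Assumptions (A1), (A2), (A3) are inserted into the expression for T. It uses only the symbols already introduced ($T_a$, $\tau_0$, $\tau_1$, $\tau_2$, $N_0$, $N_1$, $N_2$, n, m) and should be read as an illustration of the reasoning, not as a quotation of the original equations.

```latex
\begin{aligned}
T &= T_a + \tau_0 N_0 + \tau_1 N_1 + \tau_2 N_2 \\
  &= 0 + \tau_0 N_0 + n\,\tau_0 N_1 + m\,\tau_0 N_2 \\
  &= \tau_0\,(N_0 + n N_1 + m N_2) = \tau_0 N_e .
\end{aligned}
```

Under the three assumptions, therefore, the CPU time is proportional to the linear combination (4), with the basic function evaluation time as the constant of proportionality.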
4. Cases Where the Assumptions Are Satisfied

In this section, we present some cases where the
equivalent number of function evaluations (11) can be regarded
as a correct indicator of computational speed.

Example 4.1. Suppose that the function (1) is such that
analytical expressions for the components of the gradient and
the elements of the Hessian matrix are not available.
Consequently, some numerical approximation scheme to the gradient
and the Hessian is necessary. For example, suppose that a
forward difference scheme is employed. Denote by $\varepsilon$ some
small number, and denote by $u_i$ a unit vector in the
$x_i$-direction. Then, the following relations hold:

$$ f(x + \varepsilon u_i) - f(x) \cong \varepsilon u_i^T g(x) = \varepsilon g_i(x) , \tag{14} $$

where i = 1, 2, ..., n, and

$$ f(x + \varepsilon u_i + \varepsilon u_j) - f(x + \varepsilon u_i) - f(x + \varepsilon u_j) + f(x) \cong \varepsilon^2 u_i^T H(x) u_j = \varepsilon^2 H_{ij}(x) , \tag{15} $$

where i = 1, 2, ..., n and j = i, i+1, ..., n.

If f(x) is known, Eq. (14) shows that the computation of
the gradient g(x) requires n additional function evaluations.
In turn, if f(x) and g(x) are known, Eq. (15) shows that the
computation of the Hessian H(x) requires m additional function
evaluations. Clearly, Eqs. (14) and (15) illustrate the
validity of Assumptions (A1) and (A2). Then, provided one
can ascertain that Assumption (A3) is true, one concludes that
the parameter (11) is a correct indicator of computational
speed.
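
The evaluation counts claimed in Example 4.1 can be checked with a minimal Python sketch of the forward difference scheme. The helper names (`count_calls`, `shifted`, `fd_gradient`, `fd_hessian`), the step size 1e-4, and the choice of the two-variable Rosenbrock function at the point (-1.2, 1) are assumptions made for illustration only; they do not come from the original study, which used FORTRAN.

```python
def count_calls(f):
    """Wrap f so that every evaluation is tallied in wrapped.calls."""
    def wrapped(x):
        wrapped.calls += 1
        return f(x)
    wrapped.calls = 0
    return wrapped

def shifted(x, eps, *indices):
    """Return a copy of x with eps added to each listed component."""
    y = list(x)
    for k in indices:
        y[k] += eps
    return y

def fd_gradient(f, x, fx, eps):
    """Forward-difference gradient; also returns the n shifted values."""
    f_shift = [f(shifted(x, eps, i)) for i in range(len(x))]  # n evaluations
    g = [(fi - fx) / eps for fi in f_shift]
    return g, f_shift

def fd_hessian(f, x, fx, f_shift, eps):
    """Forward-difference Hessian; only entries with j >= i are evaluated."""
    n = len(x)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            fij = f(shifted(x, eps, i, j))  # m = n(n+1)/2 evaluations in all
            H[i][j] = H[j][i] = (fij - f_shift[i] - f_shift[j] + fx) / eps**2
    return H

# Hypothetical usage on the two-variable Rosenbrock function.
rosenbrock = count_calls(lambda x: (x[0] - 1)**2 + 100 * (x[0]**2 - x[1])**2)
x0 = [-1.2, 1.0]
fx0 = rosenbrock(x0)
g0, f_shift0 = fd_gradient(rosenbrock, x0, fx0, 1e-4)
calls_for_gradient = rosenbrock.calls - 1
H0 = fd_hessian(rosenbrock, x0, fx0, f_shift0, 1e-4)
calls_for_hessian = rosenbrock.calls - 1 - calls_for_gradient
print(calls_for_gradient, calls_for_hessian)
```

For n = 2 the counts are n = 2 additional evaluations for the gradient and m = 3 for the Hessian, because the shifted values already computed for the gradient are reused in the second differences.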

Example 4.2. Suppose that the function f depends on x,
not directly, but indirectly through some variable y, which
is a function of x defined by means of some definite
integral. For simplicity, assume that both x and y are n-vectors.
Then, the situation is as follows:

$$ f = f(y) , \tag{16} $$

where

$$ y = \int_0^1 \varphi(x,t)\,dt . \tag{17} $$

Denote by

$$ F(x) = f(y(x)) \tag{18} $$

the function obtained by combining (16)-(17) and eliminating
y. Observe that

$$ F_x = y_x f_y , \qquad F_{xx} = y_{xx} f_y + y_x f_{yy} y_x^T , \tag{19} $$

where the square matrix $y_x$ and the cubic array $y_{xx}$ are given
by

$$ y_x = \int_0^1 \varphi_x(x,t)\,dt , \qquad y_{xx} = \int_0^1 \varphi_{xx}(x,t)\,dt . \tag{20} $$

Next, assume that some particular algorithm is employed
in order to find the minimum of the function F(x), utilizing
the gradient (19-1) and perhaps the Hessian (19-2). Since
the computation of F, $F_x$, $F_{xx}$ requires previous numerical
integrations, defined through (17) and (20), this example
illustrates a situation where the validity of Assumption
(A3) is plausible. Then, provided one can ascertain that
Assumptions (A1) and (A2) are true, one concludes that the
parameter (11) is a correct indicator of computational
speed.
5. Cases Where the Assumptions Are Not Satisfied

In this section, we present some cases where the
equivalent number of function evaluations (11) cannot be regarded
as a correct indicator of computational speed.

Example 5.1. Suppose that the function (1) is such that
analytical expressions for the components of the gradient and
the elements of the Hessian matrix are available. In
particular, assume that the function (1) has the quadratic form

$$ f(x) = a + b^T x + \tfrac{1}{2} x^T c x , \tag{21} $$

with the implication that

$$ g(x) = b + c x , \qquad H(x) = c . \tag{22} $$

In (21)-(22), a, b, c are constants having appropriate
dimensions.

Next, consider the operational count associated with
function, gradient, and Hessian evaluations. Observe that
the computation of the function requires $(n+1)^2$
multiplications and n(n+1) sums and that the computation of the
gradient requires $n^2$ multiplications and $n^2$ sums. If we
neglect the addition times by comparison with the multiplication
times, we arrive at the following conclusions for the ratios
of the basic times $\tau_0$, $\tau_1$, $\tau_2$:

$$ \tau_1/\tau_0 = n^2/(n+1)^2 , \qquad \tau_2/\tau_0 = 0 . \tag{23} $$

Hence, for n relatively large, we have

$$ \tau_1/\tau_0 \cong 1 , \qquad \tau_2/\tau_0 = 0 . \tag{24} $$

Therefore, it appears that, from the point of view of
the CPU time, one gradient evaluation is equivalent to one
(not n) function evaluation, and one Hessian evaluation is
equivalent to zero (not m) function evaluations. As a
conclusion, it appears that Assumptions (A1) and (A2) are not
justified for the quadratic function (21).

Remark. The result (24-1) represents a worst-case
condition, because function evaluation and gradient evaluation
have been regarded as separate operations. Had one
accounted for the commonality of the product cx to both f(x)
and g(x), then Eqs. (24) would have been modified as follows:

$$ \tau_1/\tau_0 \cong 0 , \qquad \tau_2/\tau_0 = 0 , \tag{25} $$

thereby invalidating Assumptions (A1) and (A2) to an even
larger degree.
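
The operational count of Example 5.1 can be tabulated with a short Python sketch. The function names and the particular grouping of the multiplications (c x first, then the two inner products, then the factor 1/2) are assumptions chosen to reproduce the stated totals; other groupings of the arithmetic are possible.

```python
def multiplications_function(n):
    """Multiplications needed for f = a + b^T x + (1/2) x^T c x."""
    cx = n * n        # matrix-vector product c x
    xcx = n           # inner product x^T (c x)
    bx = n            # inner product b^T x
    half = 1          # factor 1/2
    return cx + xcx + bx + half

def multiplications_gradient(n):
    """Multiplications needed for g = b + c x (adding b costs none)."""
    return n * n

def gradient_to_function_ratio(n):
    """Ratio of gradient to function multiplications for dimension n."""
    return multiplications_gradient(n) / multiplications_function(n)

for n in (2, 5, 10, 30, 100):
    print(n, multiplications_function(n), multiplications_gradient(n),
          round(gradient_to_function_ratio(n), 3))
```

The ratio approaches one as n grows, which is the sense in which one gradient evaluation costs about as much as one function evaluation, rather than n of them, for the quadratic function.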
Example 5.2. Suppose that the function (1) is such that
the Hessian matrix has a banded structure. For example,
consider the following generalized Rosenbrock function (see
Oren, Ref. 5):

$$ f(x) = \sum_{i=1}^{n-1} (x_i - 1)^2 + 100 \sum_{i=1}^{n-1} (x_i^2 - x_{i+1})^2 , \tag{26} $$

and observe that the associated Hessian matrix is tridiagonal.
Therefore, only those elements $H_{ij}$ which are located on the
principal diagonal and on a contiguous subdiagonal need to
be computed, since all of the remaining elements vanish. The
number of nonzero elements of the Hessian matrix that need
to be computed is

$$ \tilde{m} = 2n - 1 , \tag{27} $$

instead of m. Table 1 shows the values of $\tilde{m}$, m, and $\tilde{m}/m$ for
n ranging between 5 and 30. Note that the ratio $\tilde{m}/m$ decreases
as n increases and becomes of order 1/10 for n = 30. Therefore,
even if finite-difference methods are employed in the
computation of the Hessian matrix, it appears that Assumption
(A2) is not justified for the generalized Rosenbrock function
(26).
Table 1. Generalized Rosenbrock function.

   n     $\tilde{m}$      m     $\tilde{m}/m$

   5      9      15     0.600
  10     19      55     0.345
  15     29     120     0.242
  20     39     210     0.186
  25     49     325     0.151
  30     59     465     0.127
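
The entries of Table 1 follow directly from the two counting formulas, 2n-1 for the tridiagonal Hessian and n(n+1)/2 for a full symmetric Hessian. The short Python sketch below recomputes them; the names `hessian_counts` and `rows` are illustrative assumptions.

```python
def hessian_counts(n):
    """Return (banded count, full symmetric count, their ratio) for dimension n."""
    m_full = n * (n + 1) // 2   # distinct elements of a symmetric n x n matrix
    m_band = 2 * n - 1          # diagonal plus one contiguous subdiagonal
    return m_band, m_full, m_band / m_full

rows = [(n,) + hessian_counts(n) for n in range(5, 31, 5)]
for n, m_band, m_full, ratio in rows:
    print(f"{n:4d} {m_band:6d} {m_full:7d} {ratio:9.3f}")
```

Running the loop reproduces the six rows of the table, including the ratio of about 0.127 at n = 30.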
6. New Definition of Equivalent Number of Function Evaluations

From the examples of the previous section, it appears
that there are cases where Eq. (11) cannot be regarded as
representative of the computational speed of an algorithm,
in that it might overestimate the importance of the gradient
contribution and the Hessian contribution to the equivalent
number of function evaluations. In addition, Eq. (11)
disregards the contribution due to algorithmic operations.

In an attempt to correct the above situation, we supply
here a new definition of equivalent number of function
evaluations. Specifically, we introduce a more general
parameter $\tilde{N}_e$, which is constructed so as to reflect
accurately the computational effort associated with function
evaluations and algorithmic operations.

While we retain Eqs. (5)-(8), we replace the assumptions
expressed by Eqs. (9) with the following definitions:

$$ \text{(B1)} \quad \tau_1 = C_1 \tau_0 , \tag{28-1} $$

$$ \text{(B2)} \quad \tau_2 = C_2 \tau_0 , \tag{28-2} $$

$$ \text{(B3)} \quad T_a = C_3 T_e . \tag{28-3} $$

Here, $C_1$, $C_2$, $C_3$ are coefficients to be determined experimentally
or theoretically through an operational count. With this
understanding, Eq. (8) can be rewritten as

$$ T = \tau_0 \tilde{N}_e , \tag{29} $$

where

$$ \tilde{N}_e = (1 + C_3)(N_0 + C_1 N_1 + C_2 N_2) . \tag{30} $$

This expression constitutes a new definition of equivalent
number of function evaluations.

Alternative Definition. If one introduces the new
coefficients

$$ K_1 = C_1/n , \qquad K_2 = C_2/m , \qquad K_3 = 1 + C_3 , \tag{31} $$

Eq. (30) can be rewritten in the alternative form

$$ \tilde{N}_e = K_3 (N_0 + K_1 n N_1 + K_2 m N_2) . \tag{32} $$

Remark. With the terminology of this section, Assumptions
(A1), (A2), (A3) can now be restated as follows:

$$ C_1 = n \quad \text{or} \quad K_1 = 1 , \tag{33-1} $$

$$ C_2 = m \quad \text{or} \quad K_2 = 1 , \tag{33-2} $$

$$ C_3 = 0 \quad \text{or} \quad K_3 = 1 . \tag{33-3} $$

The coefficients $C_1$ and $C_2$ measure the relative
importance of the computational effort associated with gradient
evaluation and Hessian evaluation vis-a-vis the computational
effort associated with function evaluation. The coefficient
$C_3$ measures the relative importance of the computational
effort associated with algorithmic operations vis-a-vis the
computational effort associated with function, gradient, and
Hessian evaluations. While the coefficients $C_1$ and $C_2$ depend
on the nature of the function f(x), the coefficient $C_3$ depends
also on the structure of the particular algorithm and search
technique employed.

The above coefficients can be determined through either
an operational count or computer experimentation.

By determining the triplet ($C_1$, $C_2$, $C_3$) and the associated
triplet ($K_1$, $K_2$, $K_3$), and by measuring the deviation of these
coefficients from the idealized values (33), one can supply
an answer to the basic questions formulated in this paper,
namely, those concerning the correctness of Assumptions (A1),
(A2), (A3).
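
A minimal Python sketch of the new definition and of its alternative form is given below, to show that the two expressions agree and that the standard weights (1, n, m) are recovered when the coefficients take their idealized values. The function names are assumptions, and the evaluation counts and the value chosen for the algorithmic coefficient in the usage lines are hypothetical numbers selected only to exercise the formulas.

```python
def generalized_count(n0, n1, n2, c1, c2, c3):
    """Generalized equivalent number of function evaluations (C-coefficient form)."""
    return (1 + c3) * (n0 + c1 * n1 + c2 * n2)

def generalized_count_alt(n, n0, n1, n2, k1, k2, k3):
    """Same quantity written with the K coefficients and the weights n and m."""
    m = n * (n + 1) // 2
    return k3 * (n0 + k1 * n * n1 + k2 * m * n2)

def standard_count(n, n0, n1, n2):
    """Standard equivalent number of function evaluations with weights (1, n, m)."""
    m = n * (n + 1) // 2
    return n0 + n * n1 + m * n2

# Hypothetical counts for a four-variable problem (not measured data).
n, n0, n1, n2 = 4, 40, 40, 5
m = n * (n + 1) // 2
c1, c2, c3 = 1.2, 0.9, 0.5          # assumed coefficient values for illustration
k1, k2, k3 = c1 / n, c2 / m, 1 + c3

value_c = generalized_count(n0, n1, n2, c1, c2, c3)
value_k = generalized_count_alt(n, n0, n1, n2, k1, k2, k3)
value_std = standard_count(n, n0, n1, n2)
print(value_c, value_k, value_std)
```

With these illustrative numbers the generalized count is well below the standard one, which is the kind of gap the coefficients are meant to expose.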
7. Experimental Conditions and Test Functions

In the following sections, we describe some numerical
experiments, leading to the computation of the coefficients
$C_1$, $C_2$, $C_3$ and $K_1$, $K_2$, $K_3$ for several test functions and
minimization algorithms.

Test Conditions. All computations were performed on
the IBM 370/155 computer of Rice University. FORTRAN
programming was employed in conjunction with double-precision
arithmetic. A FORTRAN G1 compiler was used. A FORTRAN TIME
subroutine was used in order to determine the CPU times. Note
that the IBM 370/155 computer of Rice University has
multiprogramming and time-sharing capabilities.

Test Functions. Fourteen test functions were employed.
They are described below in the order of increasing dimension.

Example 7.1, Rosenbrock, n=2:

$$ f = (x_1 - 1)^2 + 100 (x_1^2 - x_2)^2 ; \tag{34} $$

Example 7.2, Himmelblau, n=2:

$$ f = (x_1^2 + x_2 - 11)^2 + (x_1 + x_2^2 - 7)^2 ; \tag{35} $$

Example  7.3,  Beale,  n=2: 
$$ f = (x_1 - 1)^2 + 100 (x_1^2 - x_2)^2 + (x_3 - 1)^2 + 90 (x_3^2 - x_4)^2 + 10.1 [(x_2 - 1)^2 + (x_4 - 1)^2] + 19.8 (x_2 - 1)(x_4 - 1) ; \tag{40} $$


Example  7.8,  Jacobson,  n=4 ; 


f = [l/4+b'^x+(l/2)x'^Mx]^  , 


(41-1) 


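$$ M = \begin{bmatrix} 4.5 & 7.0 & 3.5 & 3.0 \\ 7.0 & 14.0 & 9.0 & 8.0 \\ 3.5 & 9.0 & 8.5 & 5.0 \\ 3.0 & 8.0 & 5.0 & 8.0 \end{bmatrix} , \qquad b = \begin{bmatrix} -0.5 \\ -1.0 \\ -1.5 \\ 0.0 \end{bmatrix} ; \tag{41-2} $$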


Example  7.9,  Powell,  n=4 : 


f = (Xj^+10x2)^  + 5(x3-x^)^  + (x2-2x3)^ 


+ lO(Xj^-x^)  ; 


Example  7.10,  Powell,  n=4; 


f = (Xj^+10x2)^  + 5(x3-x^)^  + (x2-2x3)^ 


+ lO(Xj^-x^) 


r 


Example 7.11, Miele, n=4:

$$ f = (\exp x_1 - x_2)^4 + 100 (x_2 - x_3)^6 + \tan^4 (x_3 - x_4) + x_1^8 + (x_4 - 1)^2 ; $$

Example  7.12,  quadratic  function,  n=4 : 

f = (Xj^+X2+0.5x4)^  + (Xj^+2x2+X3+x^)^ 

o 2 

+ (x2+X3+1.5x^)^  + (0.5Xj^+X2+1.5x3-0.5) 

Example  7.13,  Bass,  n=5  through  30: 

n ~ n y n ^ 

f = Z xT  + [ Z{  Dx.]'^  + t K i)x.l  ; 
i=l  ^ i=l  ^ i=l 

Example 7.14, generalized Rosenbrock function (Oren),
n = 5 through 30:

$$ f = \sum_{i=1}^{n-1} (x_i - 1)^2 + 100 \sum_{i=1}^{n-1} (x_i^2 - x_{i+1})^2 $$
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(44) 


(45) 


(46) 


(47) 
8. Minimization Algorithms

Three unconstrained minimization algorithms were
employed: the Fletcher-Reeves algorithm (FR), the
Davidon-Fletcher-Powell algorithm (DFP), and the quasilinearization
algorithm (QL). The FR and DFP algorithms exemplify
first-order methods, and the QL algorithm exemplifies
second-order methods.

For all of these algorithms, let x denote the nominal
point, $\tilde{x}$ the varied point, $\hat{x}$ the previous point, p the search
direction (an n-vector), and $\alpha$ the stepsize (a scalar). Then,
the step leading from x to $\tilde{x}$ is given by

$$ \tilde{x} = x - \alpha p . \tag{48} $$

The specification of the search direction and the stepsize
is given below.

Search Direction. For the FR algorithm, the computation
of the search direction is described by the relations

$$ p = g(x) + \gamma \hat{p} , \tag{49-1} $$

$$ \gamma = g^T(x) g(x) / g^T(\hat{x}) g(\hat{x}) , \tag{49-2} $$

where $\hat{p}$ denotes the previous search direction.
For the first iteration, one sets

$$ \gamma = 0 . \tag{49-3} $$

For the DFP algorithm, let M denote a symmetric and
positive-definite matrix (an approximation to the inverse of
the Hessian matrix). Then, the computation of the search
direction is described by

$$ p = M g(x) , \tag{50-1} $$

$$ M = \hat{M} + A - B , \tag{50-2} $$

$$ A = z z^T / y^T z , \tag{50-3} $$

$$ B = \hat{M} y y^T \hat{M} / y^T \hat{M} y , \tag{50-4} $$

$$ y = g(x) - g(\hat{x}) , \tag{50-5} $$

$$ z = x - \hat{x} . \tag{50-6} $$

For the first iteration, one sets

$$ M = I , \tag{50-7} $$

where I is the identity matrix of order n.
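
The two first-order search directions can be sketched in a few lines of Python. This is an illustrative reading of the Fletcher-Reeves and Davidon-Fletcher-Powell relations, written with plain lists; the helper names (`dot`, `matvec`, `fr_direction`, `dfp_update`), the test matrix Q, and the stepsize 0.1 are assumptions made for the example and are not taken from the original FORTRAN programs.

```python
def dot(u, v):
    """Inner product of two vectors stored as lists."""
    return sum(a * b for a, b in zip(u, v))

def matvec(M, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [dot(row, v) for row in M]

def fr_direction(g, g_prev=None, p_prev=None):
    """Fletcher-Reeves direction p = g + gamma * p_prev (gamma = 0 at the start)."""
    if g_prev is None or p_prev is None:
        return list(g)
    gamma = dot(g, g) / dot(g_prev, g_prev)
    return [gi + gamma * pi for gi, pi in zip(g, p_prev)]

def dfp_update(M_prev, x, x_prev, g, g_prev):
    """DFP update M = M_prev + A - B of the inverse-Hessian approximation."""
    n = len(x)
    y = [a - b for a, b in zip(g, g_prev)]
    z = [a - b for a, b in zip(x, x_prev)]
    My = matvec(M_prev, y)   # M_prev is symmetric, so y^T M_prev = (M_prev y)^T
    yz = dot(y, z)
    yMy = dot(y, My)
    return [[M_prev[i][j] + z[i] * z[j] / yz - My[i] * My[j] / yMy
             for j in range(n)] for i in range(n)]

# Hypothetical usage on f = (1/2) x^T Q x, whose gradient is Q x.
Q = [[2.0, 1.0], [1.0, 10.0]]
x_prev = [1.0, 1.0]
g_prev = matvec(Q, x_prev)
M0 = [[1.0, 0.0], [0.0, 1.0]]                 # first iteration: M = I
p0 = matvec(M0, g_prev)
x_new = [xi - 0.1 * pi for xi, pi in zip(x_prev, p0)]   # step x - alpha p
g_new = matvec(Q, x_new)
M1 = dfp_update(M0, x_new, x_prev, g_new, g_prev)
y = [a - b for a, b in zip(g_new, g_prev)]
z = [a - b for a, b in zip(x_new, x_prev)]
print(matvec(M1, y), z)
```

A convenient check is that the updated matrix maps the gradient difference y onto the displacement z, which is the property the A and B terms are built to enforce.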

For the QL algorithm, the computation of the search
direction is described by

$$ H(x) p = g(x) . \tag{51} $$

The solution of the above linear system is done by Gaussian
elimination without pivoting.
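
For concreteness, Gaussian elimination without pivoting can be sketched as follows. The function name `solve_no_pivot`, the zero-pivot check, and the small test system are assumptions added for illustration; the original implementation was a FORTRAN routine whose details are not given.

```python
def solve_no_pivot(H, g):
    """Solve H p = g by Gaussian elimination without row interchanges."""
    n = len(g)
    A = [list(row) + [gi] for row, gi in zip(H, g)]   # augmented matrix [H | g]
    # Forward elimination: zero out the entries below each diagonal pivot.
    for k in range(n):
        if A[k][k] == 0.0:
            raise ZeroDivisionError("zero pivot: elimination without pivoting fails")
        for i in range(k + 1, n):
            factor = A[i][k] / A[k][k]
            for j in range(k, n + 1):
                A[i][j] -= factor * A[k][j]
    # Back substitution on the resulting upper-triangular system.
    p = [0.0] * n
    for i in reversed(range(n)):
        s = sum(A[i][j] * p[j] for j in range(i + 1, n))
        p[i] = (A[i][n] - s) / A[i][i]
    return p

# Hypothetical 2 x 2 system with a positive-definite matrix.
p_example = solve_no_pivot([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])
print(p_example)
```

Skipping the pivot search saves comparisons and row swaps, at the price of failing on a zero pivot and losing accuracy on small ones; for a positive-definite Hessian the pivots stay positive, so the shortcut is safe in that case.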

Search Technique. For the FR and DFP algorithms,
one-step cubic interpolation is employed in order to determine
the optimum stepsize $\alpha$. In the cubic interpolation process,
two ordinates and two slopes are employed. Therefore, it is
assumed that one iteration of each of these algorithms
requires two function evaluations and two gradient evaluations.
The details are given below.

Let $f(\alpha)$ and $f_\alpha(\alpha)$ denote the functions

$$ f(\alpha) = f(\tilde{x}) = f(x - \alpha p) , \tag{52-1} $$

$$ f_\alpha(\alpha) = -g^T(\tilde{x}) p = -g^T(x - \alpha p) p . \tag{52-2} $$

Suppose that two stepsizes $\alpha_1$ and $\alpha_2$ have been found such
that

$$ f_\alpha(\alpha_1) < 0 , \qquad f_\alpha(\alpha_2) > 0 . \tag{53} $$

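Then, the optimal stepsize $\alpha$ is determined through the
relation (Ref. 6)

$$ \alpha = \alpha_1 + \alpha_0 (\alpha_2 - \alpha_1) , \tag{54} $$

where

$$ \alpha_0 = (1/3q_3) [ -q_2 + \sqrt{q_2^2 - 3 q_1 q_3} ] , \tag{55} $$

with

$$ q_1 = (\alpha_2 - \alpha_1) f_\alpha(\alpha_1) , \tag{56-1} $$

$$ q_2 = 3 [ f(\alpha_2) - f(\alpha_1) ] - (\alpha_2 - \alpha_1) [ 2 f_\alpha(\alpha_1) + f_\alpha(\alpha_2) ] , \tag{56-2} $$

$$ q_3 = 2 [ f(\alpha_1) - f(\alpha_2) ] + (\alpha_2 - \alpha_1) [ f_\alpha(\alpha_1) + f_\alpha(\alpha_2) ] . \tag{56-3} $$

In the sequel, we assume that the stepsizes

$$ \alpha_1 = 0 , \qquad \alpha_2 = 1 \tag{57} $$

are such that Ineqs. (53) are satisfied.

For the QL algorithm, the stepsize

$$ \alpha = 1 \tag{58} $$

is employed.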
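
One-step cubic interpolation from two ordinates and two slopes can be illustrated with the Python sketch below. It fits the cubic Hermite polynomial through the two bracketing stepsizes and returns its interior minimizer. The function name `cubic_stepsize`, the argument layout, and the fallback to a quadratic formula when the cubic coefficient is numerically zero are assumptions added for this example.

```python
import math

def cubic_stepsize(a1, a2, f1, f2, d1, d2):
    """Minimizer of the cubic matching f and its slope at a1 and a2.

    f1, f2 are the ordinates and d1, d2 the slopes at a1, a2, with d1 < 0 < d2.
    """
    h = a2 - a1
    # Coefficients of f1 + q1*t + q2*t**2 + q3*t**3 in t = (a - a1) / h.
    q1 = h * d1
    q2 = 3.0 * (f2 - f1) - h * (2.0 * d1 + d2)
    q3 = 2.0 * (f1 - f2) + h * (d1 + d2)
    if abs(q3) < 1e-14 * max(1.0, abs(q2)):
        t = -q1 / (2.0 * q2)   # the interpolant degenerates to a quadratic
    else:
        t = (-q2 + math.sqrt(q2 * q2 - 3.0 * q1 * q3)) / (3.0 * q3)
    return a1 + t * h

# Hypothetical check on f(a) = a**3 - a, whose minimum on [0, 1] is at 1/sqrt(3).
step_cubic = cubic_stepsize(0.0, 1.0, 0.0, 0.0, -1.0, 2.0)
# Hypothetical check on f(a) = (a - 0.3)**2, a purely quadratic case.
step_quadratic = cubic_stepsize(0.0, 1.0, 0.09, 0.49, -0.6, 1.4)
print(step_cubic, step_quadratic)
```

Taking the root with the positive square root selects the stationary point where the second derivative of the cubic is positive, that is, the minimum rather than the maximum.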
9. Experimental Determination of the Coefficients

In this section, we describe the technique employed to
determine the coefficients $C_1$, $C_2$, $C_3$ experimentally.

Coefficients $C_1$ and $C_2$. These coefficients can be
computed with the aid of Eqs. (28-1) and (28-2) as follows:

$$ C_1 = \tau_1/\tau_0 , \qquad C_2 = \tau_2/\tau_0 . \tag{59} $$

For the basic times $\tau_0$, $\tau_1$, $\tau_2$ to be sufficiently precise,
it is necessary that the computation of f(x), g(x), H(x) be
repeated a large number of times at the same nominal point x
(for instance, 1000 times).

With the above considerations in mind, the basic times
$\tau_0$, $\tau_1$, $\tau_2$ were determined as follows: (i) by evaluating
separately the function, the gradient, and the Hessian
1000 times; (ii) by monitoring the associated CPU times,
which include the do-loop time; (iii) by determining
separately the do-loop time; and (iv) by subtracting the
do-loop time from the experimentally determined CPU times.

With the basic times $\tau_0$, $\tau_1$, $\tau_2$ known, the coefficients
$C_1$ and $C_2$ can be computed with (59). Then, the coefficients
$K_1$ and $K_2$ can be determined with (31-1) and (31-2).

Coefficient $C_3$. This coefficient can be computed with
the aid of Eqs. (5), (6), and (28-3) as follows:

$$ C_3 = (T - T_0 - T_1 - T_2)/(T_0 + T_1 + T_2) . \tag{60} $$

Next, we invoke Eqs. (7) and the definition

$$ T = \tau N , \tag{61} $$

where N denotes the number of iterations and $\tau$ denotes the
time required to perform one iteration (this includes both
the algorithmic time and the function evaluation time). In
the light of (7) and (61), Eq. (60) can be rewritten as

$$ C_3 = (\tau N - \tau_0 N_0 - \tau_1 N_1 - \tau_2 N_2)/(\tau_0 N_0 + \tau_1 N_1 + \tau_2 N_2) . \tag{62} $$

Examination of one iteration of the algorithms under
consideration shows that, for the search conditions assumed,

$$ N_0/N = 2 , \qquad N_1/N = 2 , \qquad N_2/N = 0 \tag{63} $$

for the FR and DFP algorithms and that

$$ N_0/N = 1 , \qquad N_1/N = 1 , \qquad N_2/N = 1 \tag{64} $$

for the QL algorithm. As a consequence, Eq. (62) simplifies to

$$ C_3 = (\tau - 2\tau_0 - 2\tau_1)/(2\tau_0 + 2\tau_1) \tag{65} $$

for the FR and DFP algorithms, and to

$$ C_3 = (\tau - \tau_0 - \tau_1 - \tau_2)/(\tau_0 + \tau_1 + \tau_2) \tag{66} $$

for the QL algorithm. For the basic time $\tau$ to be sufficiently
precise, it is necessary that one iteration of each of the
algorithms under consideration be repeated a large number of
times at the same nominal point x (for instance, 1000 times).

With the above considerations in mind, the basic time $\tau$
was determined as follows: (i) by executing one iteration of
each of the algorithms under consideration 1000 times; (ii) by
monitoring the associated CPU times, which include the
do-loop time; (iii) by determining separately the do-loop time;
and (iv) by subtracting the do-loop time from the
experimentally determined CPU times.

With the basic time $\tau$ known, and with $\tau_0$, $\tau_1$, $\tau_2$ also
known, the coefficient $C_3$ can be computed with (65) for the
FR and DFP algorithms and with (66) for the QL algorithm.
Then, the coefficient $K_3$ can be determined with (31-3).
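
The timing procedure of this section translates naturally into a short Python sketch: repeat an operation many times, subtract the time of the empty loop, and form the coefficient ratios. The use of `time.process_time` as the CPU clock and all function names are assumptions; the original measurements were made with a FORTRAN TIME subroutine, and timings on a modern interpreter are not comparable with those reported.

```python
import time

def basic_time(operation, repeats=1000):
    """CPU time of one call of operation(), with the empty-loop time removed."""
    start = time.process_time()
    for _ in range(repeats):
        operation()
    loop_and_work = time.process_time() - start

    start = time.process_time()
    for _ in range(repeats):
        pass
    loop_only = time.process_time() - start

    return (loop_and_work - loop_only) / repeats

def c1_c2(tau0, tau1, tau2):
    """Gradient and Hessian coefficients as ratios of basic times."""
    return tau1 / tau0, tau2 / tau0

def c3_first_order(tau, tau0, tau1):
    """FR and DFP: two function and two gradient evaluations per iteration."""
    return (tau - 2 * tau0 - 2 * tau1) / (2 * tau0 + 2 * tau1)

def c3_second_order(tau, tau0, tau1, tau2):
    """QL: one function, one gradient, and one Hessian evaluation per iteration."""
    return (tau - tau0 - tau1 - tau2) / (tau0 + tau1 + tau2)

# Hypothetical basic times in arbitrary units (not measured values).
print(c1_c2(2.0, 3.0, 1.0))
print(c3_first_order(10.0, 1.0, 1.5))
print(c3_second_order(7.0, 1.0, 1.5, 1.0))
```

A call such as `basic_time(lambda: f(x))` would then stand in for steps (i) through (iv), with `f` and `x` supplied by the user; because the clock resolution is finite, a large repeat count is needed for the subtraction to be meaningful.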
10. Numerical Results

Careful numerical experiments were performed along the
lines outlined in Sections 7-9, and the results are given in
Tables 2-5. Tables 2-3 give the coefficients C1, C2 and
K1, K2 for the test functions (34)-(47). Tables 4-5 give the
coefficient C3 for the test functions (34)-(47) and the FR,
DFP, and QL algorithms. The coefficient K3 is not given,
since it can be computed with the simple relation (31-3).

Coefficient C1. Table 2 shows that the coefficient C1
for the Rosenbrock function has the value 0.90 (instead of
n=2), and the coefficient C1 for the Powell function has the
value 1.17 (instead of n=4). Thereby, use of the standard
definition (11) of equivalent number of function evaluations
overestimates the effort associated with gradient computation
by a factor of 2.2 for the Rosenbrock function and by a factor
of 3.4 for the Powell function.

With reference to the generalized Rosenbrock function,
Table 3 shows that, for the case n=5, the coefficient C1 has
the value 1.05 (instead of n=5); for the case n=30, the
coefficient C1 has the value 1.11 (instead of n=30). Thereby,
use of the standard definition (11) of equivalent number of
function evaluations overestimates the effort associated with
gradient computation by a factor of 4.8 for n=5 variables and
by a factor of 27 for n=30 variables.

In conclusion, under the hypothesis that analytical
expressions for the components of the gradient are available,
and for Examples 7.1 through 7.14, it does not appear that
Assumption (A1) is satisfied.
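The overestimation factors quoted above are simply the ratios n/C1. The short Python check below reproduces them from the values cited in the text; the helper name `overestimation_factor` is introduced here for illustration only.

```python
# (function, idealized value n, measured C1), as quoted from Tables 2-3.
cases = [
    ("Rosenbrock", 2, 0.90),
    ("Powell", 4, 1.17),
    ("generalized Rosenbrock, n=5", 5, 1.05),
    ("generalized Rosenbrock, n=30", 30, 1.11),
]


def overestimation_factor(idealized, measured):
    """Ratio by which the idealized coefficient exceeds the measured one."""
    return idealized / measured


factors = {name: overestimation_factor(n, c1) for name, n, c1 in cases}
for name, factor in factors.items():
    print(f"{name}: factor {factor:.1f}")
```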

Coefficient C2. Table 2 shows that the coefficient C2
for the Rosenbrock function has the value 1.03 (instead of
m=3), and the coefficient C2 for the Powell function has the
value 0.92 (instead of m=10). Thereby, use of the standard
definition (11) of equivalent number of function evaluations
overestimates the effort associated with Hessian computation
by a factor of 2.9 for the Rosenbrock function and by a factor
of 11 for the Powell function.

With reference to the generalized Rosenbrock function,
Table 3 shows that, for the case n=5, the coefficient C2 has
the value 1.47 (instead of m=15); for the case n=30, the
coefficient C2 has the value 3.68 (instead of m=465). Thereby,
use of the standard definition (11) of equivalent number of
function evaluations overestimates the effort associated with
Hessian computation by a factor of 10 for n=5 variables and
by a factor of 126 for n=30 variables.

In conclusion, under the hypothesis that analytical
expressions for the elements of the Hessian matrix are
available, and for Examples 7.1 through 7.14, it does not
appear that Assumption (A2) is satisfied.
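The rapid growth of the Hessian factor with n follows from the idealized value m, which for every tabulated case equals n(n+1)/2, the number of distinct elements of a symmetric n-by-n matrix. The Python sketch below recomputes m and the ratio m/C2 from the values quoted above; the function names are illustrative.

```python
def distinct_hessian_elements(n):
    """Number of distinct elements of a symmetric n-by-n matrix."""
    return n * (n + 1) // 2


# (function, n, measured C2), as quoted from Tables 2-3.
cases = [
    ("Rosenbrock", 2, 1.03),
    ("Powell", 4, 0.92),
    ("generalized Rosenbrock, n=5", 5, 1.47),
    ("generalized Rosenbrock, n=30", 30, 3.68),
]

hessian_factors = {}
for name, n, c2 in cases:
    m = distinct_hessian_elements(n)
    hessian_factors[name] = m / c2  # idealized m over measured C2
    print(f"{name}: m = {m}, factor {hessian_factors[name]:.1f}")
```

Since m grows quadratically with n while the measured C2 grows far more slowly, the discrepancy widens quickly as the problem size increases.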
Coefficient C3. Inspection of Tables 4-5 shows that the
coefficient C3 is never negligible by comparison with 1,
meaning that the algorithmic time is never negligible by
comparison with the function evaluation time. Indeed, in
many cases, C3 can be larger than 1, meaning that the
algorithmic time can be larger than the function evaluation
time.

For example, consider the Wood function. Table 4 shows
that the coefficient C3 has the value 0.87 for the FR
algorithm, 2.88 for the DFP algorithm, and 2.12 for the QL
algorithm.

As another example, consider the generalized Rosenbrock
function. Table 5 shows that, for values of n ranging
between 5 and 30, the coefficient C3 ranges between 0.55 and
0.86 for the FR algorithm, between 3.33 and 12.92 for the
DFP algorithm, and between 2.24 and 23.85 for the QL
algorithm.

In conclusion, under the hypothesis that analytical
expressions for the components of the gradient and the
elements of the Hessian matrix are available, and for Examples
7.1 through 7.14, it does not appear that Assumption (A3) is
satisfied for the FR, DFP, and QL algorithms.

Comments.  With  reference  to  the  previous  results,  the 
following  comments  are  pertinent. 
(i) In devising the subroutines necessary to the
computation of the functions (34)-(47) and their first and
second derivatives, an effort was made to render the CPU
time as small as possible. The commonality existing between
the components of the gradient was exploited. The
commonality existing between the elements of the Hessian matrix
being computed was also exploited. Finally, the fact that
the Hessian matrix might have many elements which vanish was
taken into consideration.

(ii) For the subroutines mentioned in (i), the
computation of the function, the computation of the gradient, and
the computation of the Hessian matrix were conceived to be
separate operations. No attempt was made to exploit the
commonality between function and gradient nor the commonality
between function, gradient, and Hessian matrix. In other
words, a worst-case situation was postulated, and
conservative⁵ estimates of C1, C2, C3 were arrived at. Had the above
commonality been exploited, the values of C1, C2 would have
been smaller than those given in Tables 2-3 and the values
of C3 would have been larger than those given in Tables 4-5.

⁵ Here, the adjective conservative is employed in the sense
of "favorable" to the old definition (11) of equivalent
number of function evaluations.
(iii) For the generalized Rosenbrock function, Table 5
illustrates in a striking way the effect of the size of the
problem on the relative importance of algorithmic time
vis-a-vis function evaluation time. As n increases, C3 tends to be
constant for the FR algorithm; it increases at a linear rate
for the DFP algorithm; and it increases at a faster-than-linear
rate for the QL algorithm.

The explanation for this result is simple. As n increases,
the function evaluation time increases linearly for
all of the algorithms under consideration.⁶
On the other hand, as n increases, the algorithmic time
increases linearly for the FR algorithm, quadratically for the
DFP algorithm, and cubically for the QL algorithm. Since C3 is
the ratio of algorithmic time to function evaluation time, it
behaves as a constant, as n, and as n squared, respectively.

(iv) The coefficients C1 and C2 depend mostly on the
nature of the function f(x). On the other hand, the
coefficient C3 for the FR and DFP algorithms depends also on the
search technique employed to determine the stepsize α; should
a different search technique be employed, the value of C3
would change. By the same token, the coefficient C3 for the
QL algorithm depends on the subroutine used to solve the
linear system governing the components of the search direction;
should a different subroutine be employed, the value of C3
would change.

⁶ For the QL algorithm, the linear increase of the function
evaluation time is due to the tridiagonal structure of the Hessian
matrix associated with the generalized Rosenbrock function.
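The trends described in comment (iii) can be checked against the values that Table 5 lists for Example 7.14 (the generalized Rosenbrock function). The Python sketch below computes the increment of C3 for each step of 5 in n: near-zero increments indicate a roughly constant C3, roughly equal increments indicate linear growth, and increments that grow on the whole indicate faster-than-linear growth. This is only a rough diagnostic on six noisy data points, not a fit.

```python
n_values = [5, 10, 15, 20, 25, 30]

# C3 for Example 7.14, as listed in Table 5.
c3 = {
    "FR": [0.86, 0.72, 0.58, 0.66, 0.55, 0.60],
    "DFP": [3.33, 4.92, 6.93, 8.82, 10.89, 12.92],
    "QL": [2.24, 5.08, 8.82, 13.15, 18.96, 23.85],
}


def increments(values):
    """Differences between consecutive entries, rounded to 2 decimals."""
    return [round(b - a, 2) for a, b in zip(values, values[1:])]


for algorithm, values in c3.items():
    print(algorithm, increments(values))
```

The FR increments stay within about 0.15 of zero, the DFP increments cluster near 2, and the QL increments rise from under 3 to about 5, in line with the constant, linear, and faster-than-linear behavior stated above.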
Table 2. Results for Examples 7.1 through 7.12.

Example     n     m      C1      C2      K1      K2

7.1         2     3     0.90    1.03    0.45    0.34
7.2         2     3     1.41    1.34    0.71    0.45
7.3         2     3     1.42    1.85    0.71    0.62
7.4         3     6     1.10    1.45    0.37    0.24
7.5         3     6     1.18    1.63    0.39    0.27
7.6         3     6     1.13    1.69    0.38    0.28
7.7         4    10     1.06    0.83    0.26    0.08
7.8         4    10     1.10    2.02    0.27    0.20
7.9         4    10     1.27    1.32    0.32    0.13
7.10        4    10     1.17    0.92    0.29    0.09
7.11        4    10     1.50    1.50    0.38    0.15
7.12        4    10     1.13    0.32    0.28    0.03

Table 3. Results for Examples 7.13 and 7.14.

Example     n     m      C1      C2      K1      K2

7.13        5    15     1.90    2.94    0.38    0.20
7.13       10    55     1.99    4.89    0.20    0.09
7.13       15   120     2.05    7.05    0.14    0.06
7.13       20   210     2.18    9.36    0.11    0.04
7.13       25   325     2.17   11.78    0.09    0.04
7.13       30   465     2.06   12.92    0.07    0.03
7.14        5    15     1.05    1.47    0.21    0.10
7.14       10    55     1.09    1.86    0.11    0.03
7.14       15   120     1.09    2.38    0.07    0.02
7.14       20   210     1.12    2.81    0.06    0.01
7.14       25   325     1.09    3.11    0.04   <0.01
7.14       30   465     1.11    3.68    0.04   <0.01


Table 4. Results for Examples 7.1 through 7.12.

Algorithm                FR      DFP     QL

Example     n     m      C3      C3      C3

7.1         2     3     1.65    2.51    1.17
7.2         2     3     1.15    1.97    0.66
7.3         2     3     0.65    1.41    0.51
7.4         3     6     0.27    0.77    0.37
7.5         3     6     0.28    0.78    0.37
7.6         3     6     0.30    0.77    0.40
7.7         4    10     0.87    2.88    2.12
7.8         4    10     0.37    1.00    0.64
7.9         4    10     1.02    3.31    1.94
7.10        4    10     1.13    3.42    2.14
7.11        4    10     0.36    1.04    0.57
7.12        4    10     0.99    3.08    2.58

Table 5. Results for Examples 7.13 and 7.14.

Algorithm                FR      DFP     QL

Example     n     m      C3      C3      C3

7.13        5    15     0.48    1.80    1.03
7.13       10    55     0.42    3.14    2.26
7.13       15   120     0.41    4.27    3.76
7.13       20   210     0.37    5.72    4.82
7.13       25   325     0.39    6.90    6.59
7.13       30   465     0.41    8.06    8.28
7.14        5    15     0.86    3.33    2.24
7.14       10    55     0.72    4.92    5.08
7.14       15   120     0.58    6.93    8.82
7.14       20   210     0.66    8.82   13.15
7.14       25   325     0.55   10.89   18.96
7.14       30   465     0.60   12.92   23.85

11. Conclusions and Recommendations

In this paper, we have considered the comparative
evaluation of algorithms for mathematical programming
problems from the point of view of computational speed. We
have examined critically the indirect measurement of
computational speed through the equivalent number of function
evaluations Ne, and we have found that the Ne concept,
while accurate in some cases, has drawbacks in other cases.
Indeed, it might lead to a distorted view of the relative
importance of an algorithm with respect to another.

In an effort to correct the above distortion, we have
imbedded the parameter Ne into a more general parameter
Ñe, which is constructed so as to reflect accurately the
computational effort associated with function evaluations
and algorithmic operations. This new parameter
includes coefficients C1, C2, C3 which can be determined
either experimentally or through an operational count. When
the triplet (C1, C2, C3) takes on the values (n,m,0), the
new parameter Ñe reduces to the old parameter Ne.
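The reduction property can be illustrated with a short Python sketch. It assumes the standard form Ne = N0 + n N1 + m N2 for the old parameter, and it assumes one form of the generalized parameter, (N0 + C1 N1 + C2 N2)(1 + C3), chosen here only because it is consistent with the role of C3 as the ratio of algorithmic time to function evaluation time and with the stated reduction for (C1, C2, C3) = (n,m,0). The evaluation counts are hypothetical; the coefficient values are those listed for Example 7.1 with the FR algorithm.

```python
def ne_standard(N0, N1, N2, n, m):
    """Assumed standard form: each gradient counts as n function
    evaluations and each Hessian as m function evaluations."""
    return N0 + n * N1 + m * N2


def ne_generalized(N0, N1, N2, C1, C2, C3):
    """Assumed generalized form: measured evaluation costs C1, C2,
    scaled up by the algorithmic overhead C3."""
    return (N0 + C1 * N1 + C2 * N2) * (1 + C3)


# Hypothetical counts for a gradient method: no Hessian evaluations.
N0, N1, N2 = 40, 40, 0
n, m = 2, 3

old = ne_standard(N0, N1, N2, n, m)
reduced = ne_generalized(N0, N1, N2, C1=n, C2=m, C3=0)
measured = ne_generalized(N0, N1, N2, C1=0.90, C2=1.03, C3=1.65)
print(old, reduced, measured)
```

With the idealized triplet the two parameters coincide; with the measured coefficients they differ, which is the distortion discussed in this section.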

We have determined experimentally the coefficients C1,
C2, C3 for fourteen test functions and three minimization
algorithms. And we have found that the deviations of these
coefficients from the idealized values n,m,0 can be
substantial.

More specifically, the experimental values found for
C1 are 1.4 to 27 times smaller than the idealized value n.
The experimental values found for C2 are 1.6 to 126 times
smaller than the idealized value m. And the experimental
values found for C3 are never negligible with respect to
1; they are of order 1 for the FR algorithm, and of order 1
to 10 for the DFP and QL algorithms.

Obviously, the experimental values of C1, C2, C3 are
subject to errors due, among other things, to the
multiprogramming and time-sharing capabilities of the IBM 370/155
computer of Rice University. However, the basic fact
remains that the deviations detected for C1, C2, C3 from the
idealized values n,m,0 are so large that the use of the
Ne concept is open to serious question.

From the analyses performed and the results obtained, it
is inferred that, due to the weakness of the Ne concept, the
use of the Ñe concept is advisable as a means for comparing
different algorithms from the point of view of computational
speed. In effect, this is the same as stating that, in spite
of its obvious shortcomings, the direct measurement of the
CPU time is still the most reliable way of comparing different
minimization algorithms.

However, for the direct measurement of the CPU time to
be really meaningful, provisions similar to those implemented
in Refs. 1-2 should be employed. That is, it is necessary
that the comparison of different algorithms be done on a
single computer, with the same programming language, with
the same compiler, with the same subroutines, under similar
workload conditions of the computer, and by the same
programmer.

In closing, these authors stress that the conclusions of
this paper should not be interpreted as an invitation to
other authors to disregard reporting on number of iterations
N, number of function evaluations N0, number of gradient
evaluations N1, and number of Hessian evaluations N2. By
all means, these are useful quantities, which should be
reported because their knowledge does shed some light on the
comparative behavior of different algorithms. Nevertheless,
none of the methods for the indirect measurement of the
computational speed is truly satisfactory, and these authors
feel that there exists no reliable alternative to the direct
measurement of the CPU time.